Abstract

A key problem in the application of first-order probabilistic methods is the enormous size of the graphical models they imply. The size results from the possible worlds that can be generated by a domain of objects and relations. One of the reasons for this explosion is that existing approaches do not sufficiently exploit the structure and similarity of possible worlds in order to encode the models more compactly. We propose fuzzy inference in Markov logic networks, which enables the use of taxonomic knowledge as a source for imposing structure onto possible worlds. We show that by exploiting this structure, probability distributions can be represented more compactly and that the reasoning systems become capable of reasoning about concepts not contained in the probabilistic knowledge base.


Introduction 


Many real-world reasoning problems require the combination of relational representations with inference mechanisms that can solve the problems by reasoning from incomplete, ambiguous, inaccurate and even contradictory information. Examples of such reasoning tasks are the interpretation of natural language [Beltagy & Mooney, 2014], object recognition for robot perception [Nyga, Balint-Benczedi, & Beetz, 2014] or intent recognition in human-robot interaction [Sukthankar et al., 2014].

First-order probabilistic models [Getoor et al., 2007] have great potential to serve as powerful problem-solving tools for such application domains: joint probability distributions over the instantiated relations that describe the possible worlds in the respective domain can be queried for any aspect $Q$ contained in the model given any evidence $E$, i.e. $P(Q \mid E)$.

These powerful reasoning capabilities, however, come at the cost of computational complexity in learning and reasoning as the size of the domain under consideration grows. As a consequence, practical applications are mostly bound to small application domains with limited complexity. Many knowledge systems, however, have to work in open worlds: they are equipped with knowledge bases (KBs) that have to answer queries about unseen situations that have not been accounted for in their design, such as the examples mentioned above.

Hence, the application of expressive probabilistic representation methods requires the inference mechanisms to support off-domain reasoning - reasoning about concepts that are not explicitly represented in the KB. Most probabilistic models, however, do not support off-domain reasoning. They require every symbol subject to reasoning to be explicitly represented. On the other hand, learning a probability distribution over all possible concepts is infeasible.

We therefore aim at developing reasoning mechanisms that are able to rapidly yet flexibly generalize and learn from very few examples, abilities which have also been identified as key features of human cognition [Tenenbaum et al., 2011; Bailey, 1997]. An obvious idea to tackle this is to take into account knowledge about the taxonomic structure of the reasoning domain, which is captured by ontological knowledge representations such as description logics. To this end, the correlation between the semantic similarity of concepts and the similarity of their relational structure can be exploited for reasoning in probabilistic relational models in order to transfer the learned knowledge to classes unseen in the training data.

We propose Fuzzy-MLNs as a probabilistic reasoning framework for Markov logic networks (MLNs) [Richardson & Domingos, 2006]. Fuzzy-MLNs exploit the semantic similarity of concepts in a taxonomy in order to handle off-domain concepts in previously unseen situations in a meaningful way and hence allow efficient generalization from very sparse data, whilst the original representation formalism of MLNs remains unchanged. The key idea of Fuzzy-MLNs is to learn joint probability distributions conditioned on large taxonomic knowledge bases that are assumed to be given as factual knowledge. Indeed, a number of comprehensive high-quality taxonomies exist that have been carefully designed to reflect the semantic similarity of concepts [Fellbaum, 1998; Lenat, 1995], which we use as an implementational basis. In contrast to existing probabilistic methods incorporating class hierarchies, Fuzzy-MLNs do not target reasoning about the taxonomic structure as such. This comes with the advantage that the concepts subject to reasoning do not need to be exhaustively modelled in the probabilistic KB. This enables (1) compact representation of knowledge, (2) powerful generalization from sparse training data and (3) reduced complexity of learning and inference. In particular, our contributions are the following:

1. We present an approach for reasoning about unknown concepts by exploiting their semantic similarity to known concepts in Markov logic, a formalism that typically impedes a compact representation of classes that are hierarchically organized in a taxonomy.

2. We propose a reasoning framework for MLNs that enables inference in the presence of vague evidence, which allows a very compact representation of knowledge in MLNs and learning from sparse data.

3. We demonstrate the strengths of Fuzzy-MLNs by the example of word-sense disambiguation and showcase their strong generalization abilities.

Running Example 

Let word-sense disambiguation (WSD) and semantic role labelling (SRL), which are widely studied problems in natural-language processing, be our running examples. Solving these problems enables software systems to interpret incomplete and ambiguous instructions and transform them into well-defined action specifications. More specifically, take the terms ‘cup’ and ‘milk’ and their usage in the two instructions ‘fill a cup with milk’ and ‘add a cup of milk’. In the former case, ‘cup’ refers to a drinking mug, a physical object that can hold milk. In the latter case, it refers to a measurement unit specifying the amount of milk to be added to something not further specified.



Figure 1: Excerpt of the WordNet taxonomy of concepts for 
the ‘containers-&-liquids’ example. 

Figure 1 shows a small excerpt of the WordNet taxonomy of possible word senses covering this example. Using the taxonomy, we can represent the two instructions using the following logical assertions.

instruction 1:

    instance-of(Fill, fill-sense)
    is-a(fill-sense, fill.v.01)
    instance-of(cup, cup-sense_1)
    is-a(cup-sense_1, cup.n.01)
    instance-of(milk, milk-sense)
    is-a(milk-sense, milk.n.01)
    sem_role(cup, goal)
    sem_role(milk, theme)

instruction 2:

    instance-of(Add, add-sense)
    is-a(add-sense, add.v.01)
    instance-of(cup, cup-sense_2)
    is-a(cup-sense_2, cup.n.01)
    instance-of(milk, milk-sense)
    is-a(milk-sense, milk.n.01)
    sem_role(cup, amount)
    sem_role(milk, theme)


The assertions assign a word sense (instance-of) to each word. The word sense is linked to the taxonomy using the is-a predicate. In addition, the predicate sem_role states the semantic role that the word takes in the instruction: whether it is the object acted on, the source of the stuff to be transferred, the destination, the action verb, and so on.

Now suppose we have a taxonomy and two exemplary instructions to learn from: ‘fill a cup with milk’ and ‘add a cup of milk’. For the sentence ‘fill water into the pot’, a probabilistic reasoner should infer that water is the stuff to be added and the pot the destination, even when ‘water’ and ‘pot’ are not contained in the probabilistic knowledge base. The reason is that ‘water’ is a liquid like ‘milk’ and therefore semantically similar to it, and that a ‘pot’ is also a container and therefore similar to a cup. Current first-order probabilistic reasoning frameworks cannot perform this pattern of reasoning, as they are restricted to concepts contained in their probabilistic knowledge base.

In the following sections we explain how MLNs can be extended to perform such reasoning tasks. Note that the reasoning tasks we are interested in do not concern whether or not two concepts are similar. This is already asserted in the taxonomy. Rather, we want to infer the concepts that entities belong to and the roles they take in actions.

Foundations 

Before defining Fuzzy-MLNs, we first introduce the formal groundwork they are based on: Description Logics (DL), Fuzzy Logic (FL) and Markov logic networks (MLNs).

Markov Logic Networks. Our basic formalism for representing, learning, and reasoning about first-order probabilistic knowledge bases is the MLN. Formally, an MLN $L$ is given by a set of pairs $(F_i, w_i)$, where $F_i$ is a formula in first-order logic (FOL) and $w_i$ is a real-valued weight. For each finite domain of discourse $D$, a ground Markov random field (MRF) can be instantiated by introducing to the MRF a Boolean variable for each ground atom and a binary feature $f_j : \mathcal{X} \to \{0,1\}$ for each ground formula $F_j$, whose value for a possible world $x \in \mathcal{X}$ is 1 if the respective ground formula is satisfied in $x$ and 0 otherwise, and whose weight is $w_j$. The ground MRF specifies a probability distribution over the set of possible worlds $\mathcal{X}$ according to

$$P(X = x) = \frac{1}{Z} \exp\Big( \sum_{(F_j, w_j) \in G} w_j f_j(x) \Big), \qquad (1)$$

where $Z$ is a normalization constant and $G$ is an indexed set of weighted ground formulas, i.e. a set of pairs $(F_j, w_j)$ containing a pair $(F_j, w_j = w_i)$ for every ground formula $F_j$ of the formula $F_i$, and $f_j$ is the feature associated with the $j$-th pair.
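To make Equation (1) concrete, the following minimal sketch enumerates all possible worlds of a tiny ground MRF by brute force and normalizes the exponentiated weighted feature sums. The function name `mln_distribution`, the single ground atom and the weight ln 3 are illustrative assumptions and are not taken from the paper; exhaustive enumeration is only feasible for toy domains.

```python
import itertools
import math


def mln_distribution(atoms, weighted_features):
    """Return a list of (world, probability) pairs following Equation (1).

    atoms: list of ground-atom names (each becomes a Boolean variable).
    weighted_features: list of (weight, feature) pairs; each feature maps a
        world (dict: atom name -> bool) to 0 or 1.
    """
    # Enumerate every truth assignment to the ground atoms.
    worlds = [dict(zip(atoms, values))
              for values in itertools.product([False, True], repeat=len(atoms))]
    # Unnormalized score of a world: exp(sum_j w_j * f_j(x)).
    scores = [math.exp(sum(w * f(x) for w, f in weighted_features))
              for x in worlds]
    z = sum(scores)  # normalization constant Z
    return [(x, s / z) for x, s in zip(worlds, scores)]


# Hypothetical toy model: one ground atom, one ground formula with weight ln(3).
atoms = ["holds(Cup, Milk)"]
features = [(math.log(3.0), lambda x: 1 if x["holds(Cup, Milk)"] else 0)]
dist = mln_distribution(atoms, features)
p_true = next(p for x, p in dist if x["holds(Cup, Milk)"])
```

With a single formula of weight ln 3, the world satisfying the formula is three times as probable as the world violating it, so `p_true` evaluates to 3/4.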

Description Logics. The formulas in our probabilistic KBs are not independent of each other. Rather, there are many constraints between them. For example, if an entity $e$ is an instance of the concept Cup, then there might be another entity $e'$ such that the relation holds(·,·) holds for the pair $(e, e')$, i.e. the assertion holds($e$, $e'$) must hold. DLs are appropriate representation mechanisms to state such relations. In DL, these constraints are asserted as terminological axioms of the form $c \equiv \mathit{exp}$. In our case, we can assert, for instance, $\textit{Cup} \equiv \textit{Container} \sqcap \exists\,\textit{holds}.\textit{Liquid} \sqcap \exists\,\textit{has}.\textit{Handle}$ in order to state that the concept of a cup is the intersection of the concept of a container with the concepts of things that hold some liquid and of things that have a handle. For the purpose of this work, it is important to note that the concept that is defined ‘inherits’ the constraints from the concepts it is defined with, forming a taxonomy relation $C$. Therefore, the similarity of the relational structure of concepts is highly correlated with their distance in the concept taxonomy. $T$ denotes the set of all concepts in $C$.


Semantic similarity. The semantic similarity of two concepts in DL-based representations can be characterized in terms of the relative location of the two concepts in the taxonomy. Popular measures take into account the lengths of the shortest paths between two concepts in the respective taxonomy graph. The shorter the paths connecting the two nodes in the graph are, the more similar the respective concepts are assumed to be. Among those similarity measures, the WUP similarity [Wu & Palmer, 1994], $\sim\; : T \times T \to [0,1]$, is the most widely used. In its standard form, it defines the semantic similarity of concepts in a class taxonomy as

$$c_1 \sim c_2 := \frac{2 \cdot \mathrm{depth}(\mathrm{lcs}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)},$$

where $\mathrm{depth}(c)$ is the depth of concept $c$ in the taxonomy and $\mathrm{lcs}(\cdot,\cdot)$ denotes the least common super-concept of two concepts in $C$.
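As an illustration of how such a path-based similarity can be computed, the sketch below implements the standard Wu-Palmer formulation on a small tree-shaped taxonomy. The taxonomy, the concept names and the convention that the root has depth 1 are assumptions made for this example; they are not the WordNet excerpt of Figure 1, and real WordNet similarities will differ.

```python
def ancestors(parent, c):
    """Return the path from concept c up to the root, c itself included."""
    path = [c]
    while parent[c] is not None:
        c = parent[c]
        path.append(c)
    return path


def depth(parent, c):
    """Depth of c in the taxonomy; the root has depth 1 (assumed convention)."""
    return len(ancestors(parent, c))


def lcs(parent, c1, c2):
    """Least common super-concept: the deepest ancestor shared by c1 and c2."""
    seen = set(ancestors(parent, c1))
    for a in ancestors(parent, c2):  # walks upwards, so the first hit is deepest
        if a in seen:
            return a
    raise ValueError("concepts do not share a root")


def wup(parent, c1, c2):
    """Wu-Palmer similarity in [0, 1]; 1 means identical concepts."""
    common = lcs(parent, c1, c2)
    return 2.0 * depth(parent, common) / (depth(parent, c1) + depth(parent, c2))


# Hypothetical toy taxonomy, given as child -> parent links.
toy_taxonomy = {
    "entity": None,
    "liquid": "entity", "container": "entity",
    "milk": "liquid", "water": "liquid",
    "cup": "container", "pot": "container",
}
sim_milk_water = wup(toy_taxonomy, "milk", "water")  # siblings under 'liquid'
sim_milk_cup = wup(toy_taxonomy, "milk", "cup")      # related only via the root
```

In this toy tree, 'milk' and 'water' share the super-concept 'liquid' at depth 2, giving 2·2/(3+3) = 2/3, whereas 'milk' and 'cup' only share the root, giving 2·1/(3+3) = 1/3.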


Fuzzy Logic. As we want to reason about concepts that are not contained in our probabilistic model, we need representational means to express our expectations about the properties of an unknown concept, of which we are uncertain. To do this, we replace the binary truth values in MLNs with degrees of belief about whether or not relations hold for a concept not contained in the probabilistic model. We use fuzzy logic (FL) for this purpose, a multi-valued extension of propositional logic (PL). FL has its foundations in the theory of fuzzy sets, in which elements belong to a set only to a certain degree. Formally, a fuzzy subset $x$ of a set $X$ is a pair $(X, \pi_x)$, where $X$ is called the universe and $\pi_x : X \to [0,1]$, called the membership function, determines the degree to which a particular element actually belongs to $x$. In FL, the universe $X$ is given by the set of atomic propositions and $\pi_x$ is a fuzzy interpretation of $X$ assigning every proposition in $X$ a real-valued degree of truth. It provides a calculus analogous to the calculus of PL: if $A$ and $B$ are propositions in FL, then the logical connectives with respect to $x$ are defined as $A \land B := \min(\pi_x(A), \pi_x(B))$, $A \lor B := \max(\pi_x(A), \pi_x(B))$, and $\lnot A := 1 - \pi_x(A)$. Note that the multi-valued logical calculus of FL reduces to its binary counterpart of PL in the extreme cases where all propositions have Boolean truth values.
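The fuzzy connectives defined above can be sketched in a few lines. The function names are hypothetical; truth values are plain floats in [0, 1], and the last lines check the stated reduction to propositional logic for Boolean inputs.

```python
def fuzzy_and(*truth_values):
    """Conjunction: the minimum of the operands' degrees of truth."""
    return min(truth_values)


def fuzzy_or(*truth_values):
    """Disjunction: the maximum of the operands' degrees of truth."""
    return max(truth_values)


def fuzzy_not(truth_value):
    """Negation: one minus the operand's degree of truth."""
    return 1.0 - truth_value


# A conjunction of two Boolean atoms (truth 1) and one vague atom (truth 0.9).
vague_conjunction = fuzzy_and(1.0, 1.0, 0.9)

# With truth values restricted to {0, 1}, the calculus behaves like PL.
boolean_cases = [(a, b, fuzzy_and(a, b), fuzzy_or(a, b))
                 for a in (0.0, 1.0) for b in (0.0, 1.0)]
```

Because the conjunction takes the minimum, a single vague operand caps the truth value of the whole formula, which is the mechanism Fuzzy-MLNs rely on when is-a atoms carry similarity values.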

Fuzzy-MLNs 

A Fuzzy-MLN $F$ is a pair $(L, C)$, where $L$ is an MLN and $C$ is a taxonomy of concepts, such that $L$ represents a conditional probability distribution

$$P(\textit{instance-of}(\cdot,\cdot), \ldots \mid \textit{is-a}(\cdot,\cdot), \ldots). \qquad (2)$$

In addition, the following conditions hold:

1. An entity $e$ in the domain of discourse $D$ is always connected to a concept $c$ in the taxonomy $C$ by a proposition instance-of($e$, $s$) ∧ is-a($s$, $c$), where $s, c \in T$.

2. All ground atoms of the form is-a($s$, $c$), where $s, c \in T$, take real-valued degrees of truth in $[0,1]$, which we call semantic similarity. Ground atoms of all other predicates take strictly binary truth values in $\{0,1\}$.

3. The set $\mathcal{X}$ of possible worlds represented by $F$ is the set of all fuzzy subsets of the set of all ground atoms $X$, where the membership function for every ground atom is-a($s$, $c$) is equal across all possible worlds and is defined as the semantic similarity of $s$ and $c$ with respect to $C$, i.e. for all $x \in \mathcal{X}$ and for all $s, c \in T$: $\pi_x(\textit{is-a}(s, c)) = s \sim c$.

In the following, we motivate this definition in more detail. 

Probabilistic Semantics. According to the second condition in our definition, the semantics of Fuzzy-MLNs differs from the original semantics in Equation (1) in two aspects. First, a possible world $x$ is no longer a strictly binary vector assigning a truth value to every ground atom but also allows for real-valued degrees of truth. The ground MRF of a Fuzzy-MLN thus contains binary and numerical random variables: a real-valued variable for every ground atom of the form is-a(·,·) and a binary one for every other ground atom. Second, as a consequence, the semantics of the binary logical features $f_j : \mathcal{X} \to \{0,1\}$ in the ground MRF is no longer applicable. We therefore define the features associated with every ground formula $F_j$ in the MRF to take the form $f_j : \mathcal{X} \to [0,1]$, where each feature $f_j(x)$ evaluates to the truth value of its ground formula $F_j$ in $x$ by applying the fuzzy logic calculus as described above, i.e. $f_j(x) = \pi_x(F_j)$. Hence, the distribution of $F$ becomes

$$P(X = x) = \frac{1}{Z} \exp\Big( \sum_{(F_j, w_j) \in G} w_j \, \pi_x(F_j) \Big). \qquad (3)$$

Condition no. 3 in our definition ensures that the probability distribution in (3) corresponds to the conditional distribution in (2): since the truth value of a ground atom of the is-a predicate is required to be equal across all possible worlds $x$, the distribution $P(X = x)$ in (3) is effectively conditioned on every atom of the form is-a(·,·).

A Fuzzy-MLN contains two dedicated predicates, instance-of and is-a, which provide means to incorporate knowledge from the class taxonomy into the probabilistic model. In short, is-a encodes the taxonomic knowledge and instance-of is used for expressing uncertainty about which categories entities belong to. By differentiating between the two predicates, it can be modelled that one is certain about the taxonomic structure of the domain subject to reasoning but possibly uncertain about which concept an entity belongs to.







Figure 2: Posterior distributions over the taxonomy conditioned on semantic roles of a filling action according to Eq. (4). More intense node colors indicate higher probability. Left: $w_1$ given sem_role($w_1$, theme). Right: $w_2$ given sem_role($w_2$, goal).


In contrast to MLNs, Fuzzy-MLNs do not require all predicates to be Boolean. Variables (ground atoms) of the form is-a($s$, $c$) in the ground MRF take real-valued degrees of truth in $[0,1]$, which express the degree to which $s$ is similar to $c$. Here, a value of 1 denotes maximal similarity, whereas 0 denotes maximal dissimilarity. This allows us to represent entities that belong to concepts not contained in the probabilistic knowledge base by referring to them in terms of their similarity to known concepts. Note that in Fuzzy-MLNs, the semantic similarities do not have to be computed by probabilistic inference as in other formalisms such as probabilistic similarity logic (PSL). Rather, they are always given by the taxonomy structure and exclusively appear as evidence. This makes the representation of the conditional distribution in (2) very compact, since the taxonomy structure may be collapsed into single numeric values, which scale the contribution of every single ground formula to the probability mass in (3) by the similarities of its constituents to concepts that are contained in the model. This allows the learned knowledge to be generalized to classes unseen in the learning data. In addition, realizing Fuzzy-MLNs without having to equip them with the capability of reasoning about the similarity relation as such enables us to escape a complexity monster. Without this restriction, inference and learning would require us to compute integrals over those variables, rendering the computational complexity infeasible for practical applications.

Since the distribution of a Fuzzy-MLN is conditioned on the taxonomic structure of the domain, the second predicate, instance-of, is used to link any entity in the domain of discourse to a concept in $C$. Unlike is-a, instance-of is Boolean and may be subject to inference. Propositions about class memberships of an entity $e$ are made in the form instance-of($e$, $s$) ∧ is-a($s$, $c$).

Let a minimalistic example illustrate how inference about unknown concepts can be achieved in Fuzzy-MLNs. Suppose we want to represent the conditional distributions that parrots can fly and that mammals cannot. In Markov logic, we can establish these distributions in an MLN with, for example, the two weighted formulas

    w_1 = ln(0.9/0.1)   flies(e) ∧ instance-of(e, parrot.n.01) ∧ is-a(parrot.n.01, parrot.n.01)

    w_2 = ln(0.1/0.9)   flies(e) ∧ instance-of(e, mammal.n.01) ∧ is-a(mammal.n.01, mammal.n.01).

In classical MLNs, reasoning can only be performed about instances of either of the concepts parrot.n.01 and mammal.n.01, because for any other concept, none of the formulas is applicable. Using a Fuzzy-MLN with the same model structure and an underlying taxonomy $C$, however, we can tackle reasoning tasks outside the model domain, such as $P(\textit{flies}(\textit{Fred}) \mid \textit{instance-of}(\textit{Fred}, \textit{turkey.n.01}))$. In this example, there are two ground atoms of the is-a predicate, is-a(turkey.n.01, parrot.n.01) and is-a(turkey.n.01, mammal.n.01), which are, for instance, assigned the truth values

    π_x(is-a(turkey.n.01, parrot.n.01)) = 0.90
    π_x(is-a(turkey.n.01, mammal.n.01)) = 0.01

in every possible world $x$ according to a similarity measure. Consequently, the influence of the two ground formulas

    F_1 = flies(Fred) ∧ instance-of(Fred, turkey.n.01) ∧ is-a(turkey.n.01, parrot.n.01)

    F_2 = flies(Fred) ∧ instance-of(Fred, turkey.n.01) ∧ is-a(turkey.n.01, mammal.n.01)

on the distribution in (3) is scaled down by the similarity of concepts. In the extreme case of maximal dissimilarity of two concepts, the contribution of every ground formula vanishes, resulting in a uniform distribution. This is reasonable, since we cannot make any well-informed statement about entities that are maximally dissimilar to everything that is contained in the model.
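The turkey example can be worked through numerically. The sketch below evaluates the fuzzy distribution for the single query atom flies(Fred), treating instance-of(Fred, turkey.n.01) as evidence with truth value 1 and using the weights and similarity values quoted in the text. The helper name `p_flies` and the brute-force two-world enumeration are assumptions made for illustration, not the paper's inference procedure.

```python
import math


def p_flies(weighted_similarities):
    """P(flies(Fred) | instance-of(Fred, s)) under the fuzzy semantics.

    weighted_similarities: list of (weight, similarity) pairs, one per ground
        formula flies(Fred) AND instance-of(Fred, s) AND is-a(s, c), where the
        similarity is the fixed truth value of is-a(s, c).
    """
    def score(flies):
        # Truth of each conjunction is min(flies, instance-of = 1, similarity).
        return math.exp(sum(w * min(flies, 1.0, sim)
                            for w, sim in weighted_similarities))

    return score(1.0) / (score(1.0) + score(0.0))


w_parrot = math.log(0.9 / 0.1)  # weight w_1 from the text
w_mammal = math.log(0.1 / 0.9)  # weight w_2 from the text

# Turkey: similar to parrot (0.90), hardly similar to mammal (0.01).
p_turkey = p_flies([(w_parrot, 0.90), (w_mammal, 0.01)])

# A known parrot, considering only the parrot formula with similarity 1.
p_parrot = p_flies([(w_parrot, 1.0)])

# An entity maximally dissimilar to every modelled concept.
p_unrelated = p_flies([(w_parrot, 0.0), (w_mammal, 0.0)])
```

Under these assumptions, the turkey query yields a probability of roughly 0.88, slightly below the 0.9 obtained for a parrot under the first formula alone, while the maximally dissimilar entity receives the uniform value 0.5, matching the extreme case described above.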


Running example continued. Let us now continue with our running example and explain how Fuzzy-MLNs solve the respective reasoning tasks. We consider again the two training databases corresponding to the instructions (1) ‘fill a cup with milk’ and (2) ‘add a cup of milk’. In order to model word sense and role/sense co-occurrences, we construct a Fuzzy-MLN consisting of one single weighted template formula,

    instance-of(w_1, s_1) ∧ is-a(s_1, +c_1)
      ∧ instance-of(w_2, s_2) ∧ is-a(s_2, +c_2)
      ∧ sem_role(w_1, +r_1) ∧ sem_role(w_2, +r_2) ∧ w_1 ≠ w_2,

which has been trained with the two databases introduced at the beginning.

Figure 3: F1 scores for inverse k-fold cross validation for k = 1/9 ... 9/1 using classical MLNs with FOL semantics and Fuzzy-MLNs applied to a WSD problem of 20 examples per action verb.

In order to illustrate that the learned MLN indeed generalizes reasonably across classes, we visualize the posterior distributions over the WordNet taxonomy for two exemplary queries. Figure 2 shows the posteriors of two queries for the meaning of a word representing the theme of a ‘filling’ activity and its goal, respectively, i.e.

$$P\left( \begin{array}{l} \textit{instance-of}(w_1, s_1), \\ \textit{instance-of}(w_2, s_2) \end{array} \;\middle|\; \begin{array}{l} \textit{instance-of}(w', \textit{fill.v.01}), \\ \textit{sem\_role}(w', \textit{action\_verb}), \\ \textit{sem\_role}(w_1, \textit{theme}), \\ \textit{sem\_role}(w_2, \textit{goal}), \\ \textit{is-a}(\textit{fill.v.01}, \textit{fill.v.01}), \ldots \end{array} \right). \qquad (4)$$

The distributions show that, conditioned on the semantic role of a word, two clearly separable clusters of concepts emerge in the taxonomy. For the theme role of a filling action, all substances/liquids gain considerably high probability, whereas the goals of such an action are represented by all types of containers. Note that categories not explicitly modelled, such as water.n.06 and soup.n.01, or spoon.n.01, bowl.n.03 and glass.n.02, respectively, have also been assigned significant probability masses, indicating that the model indeed generalizes reasonably across object categories.

Experiments 

We evaluate our method by comparing its performance against classical MLNs with FOL semantics applied to the problem of word-sense disambiguation. We use a real-world data set of natural-language instructions that have been mined from the wikihow.com web site and manually annotated with correct word senses. We take into account sense co-occurrences and part-of-speech tags. The MLN thus contains only one single template formula,

    has_pos(w_1, +p_1) ∧ has_pos(w_2, +p_2)
      ∧ instance-of(w_1, s_1) ∧ is-a(s_1, +c_1)
      ∧ instance-of(w_2, s_2) ∧ is-a(s_2, +c_2) ∧ w_1 ≠ w_2.

In order to showcase the generalization capabilities of Fuzzy-MLNs, we chose the hardest experimental setup we can imagine: (1) the datasets have been selected to exhibit maximal entropy with respect to the concepts that are contained in the examples, so that they are as dissimilar as possible, and (2) the model was trained with only very small portions of training data. We conduct ‘inverse’ k-fold cross-validation, a modification of traditional cross-validation in which inverse proportions of training and test set sizes are also considered. For k = 1/9, for example, we use only 10% of the available data for training the model, and the remaining 90% serve for evaluation. Conversely, k = 9/1 corresponds to classical 10-fold cross validation.
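One way to generate the training/test splits for such an ‘inverse’ cross-validation is sketched below. The text does not specify how the folds are formed or rotated, so the interleaved fold assignment, the rotation scheme and the function name `inverse_kfold_splits` are assumptions made for this illustration only.

```python
def inverse_kfold_splits(examples, train_folds, n_folds=10):
    """Yield (train, test) lists using `train_folds` of `n_folds` for training.

    train_folds=1 with n_folds=10 corresponds to k = 1/9 (10% training data),
    train_folds=9 corresponds to k = 9/1 (classical 10-fold cross validation).
    """
    if not 0 < train_folds < n_folds:
        raise ValueError("train_folds must lie strictly between 0 and n_folds")
    # Assumed fold assignment: deal the examples out round-robin.
    folds = [examples[i::n_folds] for i in range(n_folds)]
    for start in range(n_folds):
        # Rotate a window of `train_folds` consecutive folds (with wrap-around).
        train_ids = {(start + j) % n_folds for j in range(train_folds)}
        train = [e for i in sorted(train_ids) for e in folds[i]]
        test = [e for i in range(n_folds) if i not in train_ids
                for e in folds[i]]
        yield train, test


# 20 placeholder examples, as used per action verb in the experiments.
examples = list(range(20))
sparse_splits = list(inverse_kfold_splits(examples, train_folds=1))   # k = 1/9
classic_splits = list(inverse_kfold_splits(examples, train_folds=9))  # k = 9/1
```

With 20 examples, the k = 1/9 setting trains on 2 examples and evaluates on the remaining 18 in each of the ten rotations, while k = 9/1 trains on 18 and evaluates on 2.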

We group the instructions with respect to the action verbs they contain and use 20 examples per action verb in each fold. The results are shown in Figures 3 and 4. Fuzzy-MLNs clearly outperform the classical MLNs in almost every test case. Moreover, Fuzzy-MLNs achieve F1 scores significantly above 0.5 even with very small portions of training data (cf. ‘filling’ with only 10% of the data). The F1 score measures the classification accuracy with respect to the word meanings from the taxonomy assigned to each word in the respective NL instruction. It is interesting to note that, while only moderate improvements in classical MLNs are recorded with increasing amounts of training databases, the most significant performance jumps with Fuzzy-MLNs can be observed when only sparse training data is used. In these extreme cases, where concepts occur in the test data that are not contained in the training data, classical MLNs (and all other approaches mentioned in the related work) cannot perform meaningful reasoning and are forced to guess randomly. This shows that fuzzy inference in MLNs can perform adequate reasoning about concepts in the taxonomy that are not explicitly represented in the probability distribution and have not been seen during training.

    Action verb  Semantics  k = 1/9   2/8   3/7   4/6   5/5   6/4   7/3   8/2   9/1
    Filling      FOL           0.40  0.41  0.41  0.42  0.44  0.49  0.44  0.46  0.51
                 Fuzzy         0.64  0.69  0.67  0.68  0.75  0.75  0.75  0.75  0.75
    Adding       FOL           0.27  0.29  0.29  0.32  0.29  0.34  0.38  0.36  0.38
                 Fuzzy         0.44  0.50  0.49  0.51  0.49  0.52  0.57  0.56  0.57
    Slicing      FOL           0.28  0.30  0.30  0.31  0.31  0.34  0.34  0.34  0.34
                 Fuzzy         0.36  0.49  0.54  0.48  0.60  0.56  0.61  0.61  0.65
    Cutting      FOL           0.27  0.28  0.29  0.29  0.32  0.32  0.34  0.34  0.34
                 Fuzzy         0.40  0.51  0.57  0.62  0.64  0.64  0.66  0.66  0.66
    Putting      FOL           0.42  0.43  0.43  0.44  0.46  0.45  0.46  0.53  0.55
                 Fuzzy         0.43  0.48  0.48  0.50  0.53  0.50  0.51  0.51  0.50
    Stirring     FOL           0.16  0.16  0.16  0.16  0.16  0.16  0.16  0.16  0.16
                 Fuzzy         0.53  0.79  0.73  0.77  0.76  0.83  0.83  0.82  0.82

Figure 4: Left: F1 scores averaged over all action verbs. Right: F1 scores for inverse k-fold cross validation for k = 1/9 ... 9/1.


Related Work 

A couple of frameworks have been proposed to incorporate concept taxonomies and similarity in probabilistic models, such as probabilistic description logics (PDL) [Lukasiewicz, 2008; Niepert, Noessner, & Stuckenschmidt, 2011], tractable Markov logic (TML) [Domingos & Webb, 2012] and probabilistic similarity logic (PSL) [Bröcheler, Mihalkova, & Getoor, 2010], which differ from Fuzzy-MLNs in two fundamental ways: (1) Fuzzy-MLNs do not postulate uncertainty about the taxonomy structure as such, i.e. the structure itself is not subject to reasoning, and (2) Fuzzy-MLNs do not model the whole taxonomy in the probabilistic model, but only the concepts seen during training. This makes Fuzzy-MLNs a more compact reasoning framework. TML is a subset of Markov logic networks. TML introduces the idea of concept taxonomies in MLNs, but in order to perform reasoning about superclasses, the inheritance relationship of concepts is explicitly represented in the model. By employing semantic similarity as evidence, the taxonomy relation is more compactly encoded in Fuzzy-MLNs. PSL uses a formalism similar to Fuzzy-MLNs. Unlike Fuzzy-MLNs, however, the goal of PSL is rather to reason about the degree to which a set of entities are similar to each other. Conversely, in Fuzzy-MLNs the taxonomy is fixed and serves for filling gaps in the probabilistic KB. Hybrid MLNs (HMLNs) [Wang & Domingos, 2008] extend MLNs to reason about continuous variables. They distinguish features in hard FOL from numeric features that may be expressed as ‘soft’ (in)equality constraints. Those constraints are typically connected in a multiplicative way, such that, if a logical constraint evaluates to false, a connected numeric feature will also have no influence on the probability of the respective possible world. Hence, representing semantic similarities in HMLNs does not appear straightforward. The concept of soft evidence [Jain & Beetz, 2010] is closely related to the idea of vague evidence, though it has fundamentally different semantics, for it still assumes Boolean truth values, and soft evidence serves as prior probability constraints on ground atoms.

To the best of our knowledge, none of these approaches can deal with entities that are not part of the probabilistic model in any meaningful way. This is a severe limitation, because these approaches are not capable of exhaustively modelling joint probability distributions of realistic domain sizes. Since learning in first-order probabilistic models remains intractable in the general case, inference and generalization across concepts are essential for probabilistic relational models to be scalable and applicable to real-world problems.


Conclusions 

In this work, we have described the design and the implementation of Fuzzy-MLNs, an extension of MLNs that allows us to represent probability distributions over open domains compactly - if complete ontologies are available for these domains. The basic idea underlying Fuzzy-MLNs is to explicitly represent only the small subset of concepts that is contained in the training databases. After having learned the probability distribution, Fuzzy-MLNs can reason about concepts that are not contained in the graphical model but are contained in the taxonomy. They do so by exploiting the fact that the relational structure of concepts in the taxonomy is correlated with the relational structures of the explicitly represented concepts, weighted by a notion of semantic similarity. Fuzzy-MLNs implement this bias by generalizing the is-a assertions for off-domain concepts from Boolean truth to real-valued degrees of truth. The degree of truth is then computed based on the semantic similarity of the off-domain concept to those concepts contained in the graphical model.

We have shown that Fuzzy-MLNs can perform different probabilistic reasoning tasks in a way that matches our intuitions and can outperform probability distributions learned in the ordinary MLN framework both significantly and substantially.